Abstract

Consider an $m \times N$ matrix $\Phi$ with the Restricted Isometry Property of order $k$ and level
$\delta$, that is, the norm of any $k$-sparse vector in $\mathbb{R}^N$ is preserved to within a
multiplicative factor of $1 \pm \delta$ under application of $\Phi$. We show that by randomizing the
column signs of such a matrix $\Phi$, the resulting map with high probability embeds any fixed set of
$p = O(e^k)$ points in $\mathbb{R}^N$ into $\mathbb{R}^m$ without distorting the norm of any point in
the set by more than a factor of $1 \pm 4\delta$. Consequently, matrices with the Restricted Isometry
Property and with randomized column signs provide optimal Johnson-Lindenstrauss embeddings up to
logarithmic factors in $N$. In particular, our results improve the best known bounds on the necessary
embedding dimension $m$ for a wide class of structured random matrices; for partial Fourier and partial
Hadamard matrices, we improve the recent bound $m \gtrsim \delta^{-4} \log(p) \log^4(N)$ appearing in
Ailon and Liberty [3] to $m \gtrsim \delta^{-2} \log(p) \log^4(N)$, which is optimal up to the
logarithmic factors in $N$. Our results also have a direct application in the area of compressed
sensing for redundant dictionaries.



1 Introduction

The Johnson-Lindenstrauss (JL) Lemma states that any set of $p$ points in high dimensional Euclidean
space can be embedded into $O(\varepsilon^{-2} \log(p))$ dimensions, without distorting the distance
between any two points by more than a factor between $1 - \varepsilon$ and $1 + \varepsilon$. In its
original form, the Johnson-Lindenstrauss Lemma reads as follows.

Theorem 1.1 (Johnson-Lindenstrauss Lemma [28]). Let $\varepsilon \in (0, 1)$ and let
$x_1, \ldots, x_p \in \mathbb{R}^N$ be arbitrary points. Let $m = O(\varepsilon^{-2} \log(p))$ be a
natural number. Then there exists a Lipschitz map $f : \mathbb{R}^N \to \mathbb{R}^m$ such that

$$(1 - \varepsilon)\|x_i - x_j\|_2^2 \le \|f(x_i) - f(x_j)\|_2^2 \le (1 + \varepsilon)\|x_i - x_j\|_2^2 \tag{1}$$

for all $i, j \in \{1, 2, \ldots, p\}$. Here $\|\cdot\|_2$ stands for the Euclidean norm in
$\mathbb{R}^N$ or $\mathbb{R}^m$, respectively.

As shown in [S], the bound for the size of $m$ is tight up to an $O(\log(1/\varepsilon))$ factor. In
the original paper of Johnson and Lindenstrauss, it was shown that a random orthogonal projection,
suitably normalized, provides such an embedding with high probability [28]. Later, this property was
also verified for Gaussian random matrices, among other random matrix constructions [21], [15]. As a
consequence, the JL Lemma has become a valuable tool for dimensionality reduction in a myriad of
applications ranging from computer science, numerical linear algebra HS1 HE], manifold learning [5],
and compressed sensing [7], gD], [TO].

In most of these frameworks, the map $f$ under consideration is a linear map represented by an
$m \times N$ matrix $\Phi$. In this case, one can consider the set of differences
$E = \{x_i - x_j\}$; to prove the theorem, one then needs to show that

$$(1 - \varepsilon)\|y\|_2^2 \le \|\Phi y\|_2^2 \le (1 + \varepsilon)\|y\|_2^2 \quad \text{for all } y \in E. \tag{2}$$

When $\Phi$ is a random matrix, the proof that $\Phi$ satisfies the JL lemma with high probability
boils down to showing a concentration inequality of the type

$$\mathbb{P}\big((1 - \varepsilon)\|x\|_2^2 \le \|\Phi x\|_2^2 \le (1 + \varepsilon)\|x\|_2^2\big) \ge 1 - 2\exp(-c_0 \varepsilon^2 m), \tag{3}$$

for an arbitrary fixed $x \in \mathbb{R}^N$, where $c_0$ is an absolute constant in the optimal case,
and in addition possibly mildly dependent on $N$ in almost-optimal scenarios, as for example in [3].
Indeed, it follows directly by a union bound over $E$ (as in the proof of Theorem 3.1 below) that (2)
holds with high probability.
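The passage from the single-vector concentration inequality (3) to the uniform statement (2) can be
illustrated numerically. The following is a minimal sketch, assuming a Gaussian matrix with i.i.d.
$N(0, 1/m)$ entries; the sizes `N`, `p`, `eps` and the constant 24 in the choice of `m` are
illustrative assumptions, not values taken from the text.

```python
import numpy as np

def max_distortion(points, phi):
    """Largest | ||phi y||^2 / ||y||^2 - 1 | over all pairwise differences y."""
    worst = 0.0
    p = points.shape[0]
    for i in range(p):
        for j in range(i + 1, p):
            y = points[i] - points[j]
            ratio = np.sum((phi @ y) ** 2) / np.sum(y ** 2)
            worst = max(worst, abs(ratio - 1.0))
    return worst

rng = np.random.default_rng(0)
N, p, eps = 1000, 30, 0.5
# Embedding dimension of order eps^-2 log(p); the constant 24 is an assumption.
m = int(np.ceil(24 * np.log(p) / eps ** 2))
points = rng.standard_normal((p, N))
phi = rng.standard_normal((m, N)) / np.sqrt(m)   # entries N(0, 1/m)
distortion = max_distortion(points, phi)
print(m, distortion)
```

Each of the $p(p-1)/2$ differences fails with probability at most $2\exp(-c_0 \varepsilon^2 m)$, so the
union bound leaves room for all of them once $m$ is of order $\varepsilon^{-2}\log(p)$.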

In order to reduce storage space and implementation time of such embeddings, the design of
structured random JL embeddings has been an active area of research in recent years
[U [37J 131 HI]; see [3] or [29] for a good overview of these efforts. Of particular importance in
this context is whether fast (i.e. $O(N \log(N))$) multiplication algorithms are available for the
resulting matrices. Fast JL embeddings with optimal embedding dimension
$m = O(\varepsilon^{-2} \log(p))$ were first constructed by Ailon and Chazelle PQ, but their
embeddings are fast only for $p \lesssim e^{N^{1/3}}$ vectors. This restriction on the number of
vectors was later weakened to $p \lesssim e^{N^{1/2}}$ [2]. In [3], fast JL embeddings were
constructed without any restrictions on the number of vectors, but the authors only provide the
sub-optimal embedding dimension $m = O(\varepsilon^{-4} \log(p) \log^4(N))$. In this paper, we provide
the first unrestricted fast JL construction with optimal embedding dimension up to logarithmic factors
in $N$. Note that in the range $p \gtrsim e^{N^{1/2}}$ not covered by the constructions in [3 [5], a
logarithmic factor in $N$ is bounded by $\log(\log(p))$, and thus plays a minor role.

The Johnson-Lindenstrauss Lemma in Compressed Sensing. One of the more recent applications of the
Johnson-Lindenstrauss Lemma is to the area of compressed sensing, which is centered around the
following phenomenon: For many underdetermined systems of linear equations $\Phi x = y$, the solution
of minimal $\ell_1$-norm is also the sparsest solution. To be precise, a vector $x \in \mathbb{R}^N$
is $k$-sparse if $|\{j : |x_j| > 0\}| \le k$. A by now classical sufficient condition on the matrix
$\Phi$ for guaranteeing equivalence between the minimal $\ell_1$-norm solution and the sparsest
solution is the so-called Restricted Isometry Property (RIP) [TT1 IT51 [T7].

Definition 1.2. A matrix $\Phi \in \mathbb{R}^{m \times N}$ is said to have the Restricted Isometry
Property of order $k$ and level $\delta \in (0, 1)$ (equivalently, $(k, \delta)$-RIP) if

$$(1 - \delta)\|x\|_2^2 \le \|\Phi x\|_2^2 \le (1 + \delta)\|x\|_2^2 \quad \text{for all } k\text{-sparse } x \in \mathbb{R}^N. \tag{4}$$

The restricted isometry constant $\delta_k$ is defined as the smallest value of $\delta$ for which (4)
holds.

In particular, if $\Phi$ has $(2k, \delta_{2k})$-RIP with
$\delta_{2k} < 2/(3 + \sqrt{7/4}) \approx .4627$, and if $\Phi x = y$ admits a $k$-sparse solution
$x^\#$, then $x^\# = \arg\min_{\Phi z = y} \|z\|_1$ [19].

Gaussian and Bernoulli random matrices have $(k, \delta)$-RIP with high probability, if the embedding
dimension $m \gtrsim \delta^{-2} k \log(N/k)$ [7]. Up to the constant, lower bounds for Gelfand widths
of $\ell_1$-balls [22], [20] show that this dependence on $N$ and $k$ is optimal. The Restricted
Isometry Property also holds for a rich class of structured random matrices, where usually the best
known bounds for $m$ have additional log factors in $N$. All known deterministic constructions of RIP
matrices require that $m \gtrsim k^2$ or at least $m \gtrsim k^{2-\mu}$ for some small constant
$\mu > 0$ [9].

The similarity between the expressions in (2) and (4) suggests a connection between the JL lemma and
the Restricted Isometry Property. A first result in this direction was established in [7], wherein it
was shown that random matrices satisfying a concentration inequality of type (3) (and hence the JL
Lemma) satisfy the RIP of optimal order. More precisely, the authors prove the following theorem.

Theorem 1.3 (Theorem 5.2 in [7]). Suppose that $m$, $N$, and $0 < \delta < 1$ are given. If the
probability distribution generating the $m \times N$ matrices $\Phi$ satisfies the concentration
inequality (3) with $\varepsilon = \delta$ and absolute constant $c_0$, then there exist absolute
constants $c_1, c_2$ such that with probability $\ge 1 - 2e^{-c_2 \delta^2 m}$, the RIP (4) holds for
$\Phi$ with the prescribed $\delta$ and any $k \le c_1 \delta^2 m / \log(N/k)$.

In this sense, the JL Lemma implies the Restricted Isometry Property.

Contribution of this work. We prove a converse result to Theorem 1.3: We show that RIP matrices,
with randomized column signs, provide Johnson-Lindenstrauss embeddings that are optimal up to
logarithmic factors in the ambient dimension. In particular, RIP matrices of optimal order provide
Johnson-Lindenstrauss embeddings of optimal order as such, up to a logarithmic factor in $N$ (see
Theorem 3.1). Note that without randomization, such a converse is impossible, as vectors in the null
space of the fixed parent matrix are always mapped to zero.

This observation has several consequences in the area of compressed sensing, and also allows us to
obtain improved JL embedding results for several matrix constructions with existing RIP bounds
[13l|35 ( 31, 38) 33 ]. Of particular interest is the random partial Fourier or the random partial
Hadamard matrix, which is formed by choosing a random subset of $m$ rows from the $N \times N$
discrete Fourier or Hadamard matrix, respectively, and with high probability has $(k, \delta)$-RIP if
the embedding dimension $m \gtrsim \delta^{-2} k \log^4(N)$. For these matrices with randomized column
signs, the running time for matrix-vector multiplication is $O(N \log(N))$, as opposed to the running
time of $O(Nm)$ for purely random matrices. For such constructions, the previous best-known embedding
dimension to ensure that (2) holds with probability $1 - \eta$, given by Ailon and Liberty [3], is
$m \asymp \varepsilon^{-4} \log(p/\eta) \log^4(N)$. We can improve their result to have optimal
dependence on the distortion $\varepsilon$, showing that
$m \asymp \varepsilon^{-2} \log(p/\eta) \log^4(N)$ rows suffice for the embedding.
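The construction just described (random column signs, a fast transform, and a random subset of rows)
can be sketched in a few lines. The sketch below assumes the Hadamard case with $N$ a power of two;
the function names `fwht` and `sign_randomized_partial_hadamard`, the normalization by $\sqrt{N/m}$,
and the sizes used are illustrative choices, not notation from the paper.

```python
import numpy as np

def fwht(v):
    """Unnormalized fast Walsh-Hadamard transform; len(v) must be a power of 2."""
    v = np.array(v, dtype=float)          # work on a copy
    n = v.shape[0]
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            a = v[start:start + h].copy()
            b = v[start + h:start + 2 * h].copy()
            v[start:start + h] = a + b
            v[start + h:start + 2 * h] = a - b
        h *= 2
    return v

def sign_randomized_partial_hadamard(x, rows, signs):
    """Apply Phi D_xi to x, where Phi keeps the given rows of the orthonormal
    Hadamard matrix and is rescaled by sqrt(N/m)."""
    n, m = x.shape[0], len(rows)
    transformed = fwht(signs * x) / np.sqrt(n)    # orthonormal H D_xi x
    return np.sqrt(n / m) * transformed[rows]

rng = np.random.default_rng(1)
N, m = 1024, 256
rows = rng.choice(N, size=m, replace=False)       # random subset of m rows
signs = rng.choice([-1.0, 1.0], size=N)           # Rademacher column signs
x = np.zeros(N)
x[3] = 1.0                                        # a 1-sparse unit vector
y = sign_randomized_partial_hadamard(x, rows, signs)
print(np.sum(y ** 2))
```

The transform uses $O(N \log N)$ additions, which is the source of the speed-up over the $O(Nm)$
cost of a dense matrix-vector product.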

This paper is structured as follows: Section 2 introduces necessary notation. In Section 3 we state
our main results, and Section 4 gives concrete examples of how these results improve on the best-known
JL bounds for several matrix constructions as well as applications of our findings in compressed
sensing. In Section 5 we give the relevant concentration inequalities and explicit RIP-based matrix
inequalities that are needed for the proofs, which are then carried out in Section 6.

2 Notation 

Before continuing, let us fix some notation to be used in the remainder. For $N \in \mathbb{N}$, we
denote $[N] = \{1, \ldots, N\}$. The $\ell_p$-norm of a vector
$x = (x_1, \ldots, x_N) \in \mathbb{R}^N$ is defined as

$$\|x\|_p := \Big(\sum_{j=1}^{N} |x_j|^p\Big)^{1/p}, \quad 1 \le p < \infty,$$

and $\|x\|_\infty = \max_{j=1,\ldots,N} |x_j|$ as usual. For a matrix
$\Phi = (\Phi_{j,\ell}) \in \mathbb{R}^{m \times N}$, its operator norm is
$\|\Phi\| := \sup_{\|x\|_2 = 1} \|\Phi x\|_2$, and its Frobenius norm is defined by

$$\|\Phi\|_F := \Big(\sum_{j=1}^{m} \sum_{\ell=1}^{N} |\Phi_{j,\ell}|^2\Big)^{1/2}.$$

For two functions $f, g : S \to \mathbb{R}_+$, $S$ an arbitrary set, we write $f \gtrsim g$ if there
is a constant $C > 0$ such that $f(x) \ge C g(x)$ for all $x \in S$; we write $f \asymp g$ if
$f \gtrsim g$ and $g \gtrsim f$. Let $N$ and $s \le N$ be given and set $R = \lceil N/s \rceil$. For
given $x = (x_1, \ldots, x_N) \in \mathbb{R}^N$, we say that $x$ is in decreasing arrangement if one
has $|x_i| \ge |x_j|$ for $i \le j$. For vectors in decreasing arrangement, we decompose
$x = (x_{(1)}, \ldots, x_{(J)}, \ldots, x_{(R)})$ into blocks of size $s = k/2$, i.e.
$x_{(J)} \in \mathbb{R}^s$; the last block $x_{(R)}$ is potentially of smaller size. We will also
consider the coarse decomposition $x = (x_{(1)}, x_{(\flat)})$, where
$x_{(\flat)} = (x_{(2)}, \ldots, x_{(R)}) \in \mathbb{R}^{N-s}$. Denote by $]L[$ the indices
corresponding to the $L$-th block. For $j, \ell \in [N]$ we write $j \sim \ell$ if the two indices are
associated to the same block, and we write $j \not\sim \ell$ otherwise. Given a matrix
$\Phi \in \mathbb{R}^{m \times N}$, write $\Phi_j$ to denote the $j$-th column,
$\Phi_{(J)} \in \mathbb{R}^{m \times s}$ to denote the matrix that is the restriction of $\Phi$ to the
$s$ columns indexed by $J$ (again with the obvious modification for $J = R$), and $\Phi_{(\flat)}$ to
denote the restriction of $\Phi$ to all but the first $s$ columns. Finally, for a vector
$x \in \mathbb{R}^N$, we denote by $D_x = (D_{i,j}) \in \mathbb{R}^{N \times N}$ the diagonal matrix
satisfying $D_{j,j} = x_j$.
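The decreasing arrangement and the block decomposition can be made concrete as follows. This is a
small sketch with hypothetical helper names (`decreasing_arrangement`, `blocks`) and an arbitrary
example vector; it only illustrates the notation above.

```python
import numpy as np

def decreasing_arrangement(x):
    """Return the permutation sorting |x| in non-increasing order, and the reordered vector."""
    order = np.argsort(-np.abs(x), kind="stable")
    return order, x[order]

def blocks(x_sorted, s):
    """Split into R = ceil(N/s) consecutive blocks of size s (the last may be shorter)."""
    return [x_sorted[i:i + s] for i in range(0, len(x_sorted), s)]

x = np.array([0.5, -3.0, 1.0, 2.0, -0.1, 0.7, 4.0])
order, xs = decreasing_arrangement(x)
parts = blocks(xs, s=3)          # x_(1), x_(2), x_(3)
x_flat = xs[3:]                  # coarse part x_(flat): all but the first block
print(parts)
```

Because the entries are sorted by magnitude, every entry of a block is no larger than every entry of
the preceding block, which is what makes the comparison between $\|x_{(J)}\|_\infty$ and
$\|x_{(J-1)}\|_2 / \sqrt{s}$ used later possible.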

3 The main results 

Theorem 3.1. Fix $\eta > 0$ and $\varepsilon \in (0, 1)$, and consider a finite set
$E \subset \mathbb{R}^N$ of cardinality $|E| = p$. Set $k \ge 40 \log\frac{4p}{\eta}$, and suppose
that $\Phi \in \mathbb{R}^{m \times N}$ satisfies the Restricted Isometry Property of order $k$ and
level $\delta \le \frac{\varepsilon}{4}$. Let $\xi \in \mathbb{R}^N$ be a Rademacher sequence, i.e.,
uniformly distributed on $\{-1, 1\}^N$. Then with probability exceeding $1 - \eta$,

$$(1 - \varepsilon)\|x\|_2^2 \le \|\Phi D_\xi x\|_2^2 \le (1 + \varepsilon)\|x\|_2^2 \tag{5}$$

uniformly for all $x \in E$.

Along the way, our method provides a direct converse to Theorem 1.3.

Proposition 3.2. Fix $\varepsilon \in (0, 1)$, and suppose that there is a constant $c_3$ such that
for all pairs $(k, m)$ that are admissible in the sense that $k \le c_3 \delta^2 m / \log(N/k)$,
$\Phi = \Phi(m) \in \mathbb{R}^{m \times N}$ has the Restricted Isometry Property of order $k$ and
level $\delta \le \frac{\varepsilon}{4}$. Fix $x \in \mathbb{R}^N$ and let $\xi \in \mathbb{R}^N$ be a
Rademacher sequence, i.e., uniformly distributed on $\{-1, 1\}^N$. Then there exists a constant $c_4$
such that for all $m$, $\Phi D_\xi$ satisfies the concentration inequality (3) for
$c_0 = c_4 \log^{-1}(N/k)$, where $k$ is any integer such that $(k, m)$ is admissible.
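To make the hypotheses of Theorem 3.1 concrete, the sketch below computes the smallest admissible
RIP order $k$ for given $p$ and $\eta$ and forms $\Phi D_\xi$ from a given matrix. The function names
are hypothetical, and the small matrix is only a placeholder; it is not claimed to satisfy the RIP.

```python
import math
import numpy as np

def required_rip_order(p, eta):
    """Smallest integer k with k >= 40 * log(4p / eta), as required in Theorem 3.1."""
    return math.ceil(40 * math.log(4 * p / eta))

def randomize_column_signs(phi, rng):
    """Return Phi D_xi for a fresh Rademacher vector xi."""
    xi = rng.choice([-1.0, 1.0], size=phi.shape[1])
    return phi * xi              # broadcasting multiplies column j by xi[j]

k = required_rip_order(p=1000, eta=0.01)
print(k)

rng = np.random.default_rng(0)
phi = np.arange(6.0).reshape(2, 3)          # placeholder matrix
phi_signed = randomize_column_signs(phi, rng)
```

Since $k$ grows only logarithmically in $p/\eta$, even very large point sets require a moderate RIP
order; only $N$ random bits are needed to draw $\xi$.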

4 Concrete examples and applications 

Using Theorem 3.1, we can improve on the best Johnson-Lindenstrauss bounds for several matrix
constructions that are known to have the Restricted Isometry Property:

1. Matrices arising from bounded orthonormal systems. Consider an orthonormal system of real-valued
functions $\psi_j$, $j \in [N]$, on a measurable space $S$ with respect to an orthogonalization
measure $d\nu$. Such systems are called bounded orthonormal systems if
$\sup_{j \in [N]} \sup_{x \in S} |\psi_j(x)| \le K$ for some constant $K \ge 1$. We may associate to
such a system the $m \times N$ matrix $\Phi$ with entries
$\Phi_{\ell,j} = \frac{1}{\sqrt{m}} \psi_j(x_\ell)$, where $x_\ell$, $\ell \in [m]$, are drawn
independently according to the orthogonalization measure $d\nu$. As shown in [13, 35, 31], matrices
arising as such have $(k, \delta)$-RIP with high probability if
$m \gtrsim \delta^{-2} k \log^4(N)$. By Theorem 3.1, these embeddings with randomized column signs
satisfy the JL Lemma for $m \gtrsim \varepsilon^{-2} \log(p) \log^4(N)$, which is optimal up to the
$\log(N)$ factors.$^1$

For measures with discrete support, such constructions are equivalent to choosing $m$ rows at random
from an $N \times N$ matrix with orthonormal rows and uniformly bounded entries. Examples include the
random partial Fourier matrix or random partial Hadamard matrix, formed from the discrete Fourier
matrix or discrete Hadamard matrix, respectively. (In the Fourier case, we distribute the resulting
real and complex parts in different coordinates, inducing an additional factor of 2.) Note that the
structure of these matrices allows for fast matrix-vector multiplication. Recently, Ailon and Liberty
[3] verified the JL Lemma for such constructions, with column signs randomized, when
$m \gtrsim \varepsilon^{-4} \log(p) \log^4(N)$. Our result improves the factor of $\varepsilon^{-4}$
in their result to the optimal dependence $\varepsilon^{-2}$. We note that while their proof also uses
the RIP, it additionally requires arguments from [35] that are specific to discrete bounded
orthonormal systems.

Examples of bounded orthonormal systems connected to continuous measures include the trigono- 
metric polynomials and Chebyshev polynomials, which are orthogonal with respect to the uniform 
and Chebyshev measures, respectively. The Legendre system, while not uniformly bounded, can still 
be transformed via preconditioning to a bounded orthonormal system with respect to the Chebyshev 
measure [33]. Note that all of these constructions have an associated fast transform. 

2. Partial circulant matrices. Other classes of structured random matrices known to have the RIP
include partial circulant matrices [3H [30[ [32]. In one such set-up, the first row of the
$N \times N$ matrix is a Gaussian or Rademacher random vector, and each subsequent row is created by
rotating one element to the right relative to the preceding row vector. Again, $m$ rows of this matrix
are sampled, but in contrast to partial Fourier or Hadamard matrices, the selection need not be
random. Using that convolution corresponds to multiplication in the Fourier domain, these matrices
have associated fast matrix-vector multiplication routines. In [32], such matrices were shown to have
the RIP with high probability for
$m \gtrsim \max\big(\delta^{-1} k^{3/2} \log^{3/2}(N),\ \delta^{-2} k \log^4(N)\big)$.

On the other hand, such a matrix composed with a diagonal matrix of random signs was shown to be a JL
embedding with high probability as long as $m \gtrsim \varepsilon^{-2} \log^2(p)$ [39]. Through
Theorem 3.1, the same result also holds if
$m \gtrsim \max\big(\varepsilon^{-1} \log^{3/2}(p) \log^{3/2}(N),\ \varepsilon^{-2} \log(p) \log^4(N)\big)$.
For large $p$, this is an improvement compared to [39].

$^1$ Actually, the bounds in [31] yield that $m \gtrsim \delta^{-2} k \log^3(k) \log(N)$ is sufficient
for $\Phi$ to have $(k, \delta)$-RIP with high probability. Hence $\Phi D_\xi$ is a JL-embedding for
$m \gtrsim \varepsilon^{-2} \log(p) \log^3(\log(p)) \log(N)$. However, in order to work with simpler
expressions, we bound $k \le N$ in the logarithmic factors.
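The fast multiplication for partial circulant matrices mentioned in example 2 can be sketched with
the FFT. The sketch assumes a real generating vector `g` whose $i$-th rotation to the right forms row
$i$; the function names are hypothetical, and the $1/\sqrt{m}$ normalization and the random column
signs are omitted for brevity.

```python
import numpy as np

def partial_circulant_apply(g, rows, x):
    """Selected rows of the circulant matrix with i-th row np.roll(g, i), applied to x.
    Uses (C x)_i = sum_j g[(j - i) mod N] x[j], a circular cross-correlation."""
    full = np.fft.ifft(np.conj(np.fft.fft(g)) * np.fft.fft(x)).real
    return full[rows]

def partial_circulant_dense(g, rows):
    """Explicit matrix with the same rows, for checking small cases only."""
    return np.stack([np.roll(g, i) for i in rows])

g = np.array([1.0, 2.0, 3.0, 4.0, 5.0])     # first row of the circulant matrix
x = np.array([0.5, -1.0, 2.0, 0.0, 1.0])
rows = [0, 2, 3]                            # the selection need not be random
fast = partial_circulant_apply(g, rows, x)
dense = partial_circulant_dense(g, rows) @ x
print(fast, dense)
```

To obtain the sign-randomized map of Theorem 3.1, one would apply the same routine to the entrywise
product of $x$ with a Rademacher vector.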

3. Deterministic constructions. Several deterministic constructions of RIP matrices are known,
including a recent result in [9] that requires only $m \gtrsim k^{2-\mu}$. We refer the reader to the
exposition in [9] for a good overview in this direction; we highlight two such deterministic
constructions here. Using finite fields, DeVore [16] provides deterministic constructions of cyclic
0-1-valued matrices with $(k, \delta)$-RIP with $m \gtrsim \delta^{-2} k^2 \log^2(N)$. Iwen [37]
provides deterministic constructions of 0-1-valued matrices whose number theoretic properties allow
their products with Discrete Fourier Transform (DFT) matrices to be well approximated using a few
highly sparse matrix multiplications. Both the binary-valued matrices and their products with the DFT
yield $(k, \delta)$-RIP matrices with $m \gtrsim \delta^{-2} k^2 \log^2(N)$. By Theorem 3.1, the
class of matrices that results by randomizing the column signs of either of these deterministic
constructions satisfies the JL Lemma with $m \gtrsim \varepsilon^{-2} \log^2(p) \log^2(N)$.

Note that the amount of randomness needed to construct such embeddings is still comparable to the
first two examples, requiring $N$ random bits. Under the model assumption that the entries of each
vector $x \in E$ to be embedded have random signs, however, the required randomness in the matrix is
removed completely.

In addition to their fast multiplication properties, these examples have the advantage that the
construction of the matrix embedding only uses $N + m$, $2N + m$, and $N$ independent random bits,
respectively, compared to $mN$ bits for matrices with independent entries. We note that stronger
embedding results are known with fewer bits, if one imposes restrictions on the norm of the vectors
$x \in E$ to be embedded; see [3§] and [T3].

For each of the aforementioned examples, we summarize the number of dimensions $m$ that are known to
be sufficient for $(k, \delta)$-RIP to hold. We also list the previously best known bound for the JL
embedding dimension (if there is one) along with the JL bounds obtained from Theorem 3.1. Where
Theorem 3.1 yields a better bound than previously known, at least for some range of parameters, we
highlight the result in bold face. In each of the bounds, we list only the dependence on $\delta$,
$k$, and $N$, or $\varepsilon$, $p$, and $N$, omitting absolute constants.





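Matrix | RIP bounds | Previous JL Bound | JL Bound from Theorem 3.1
-------|------------|-------------------|--------------------------
Partial Fourier | $\delta^{-2} k \log^4(N)$ | $\varepsilon^{-4} \log(p) \log^4(N)$ | $\varepsilon^{-2} \log(p) \log^4(N)$
Partial Circulant | $\max\big(\delta^{-1} k^{3/2} \log^{3/2}(N),\ \delta^{-2} k \log^4(N)\big)$ | $\varepsilon^{-2} \log^2(p)$ | $\max\big(\varepsilon^{-1} \log^{3/2}(p) \log^{3/2}(N),\ \varepsilon^{-2} \log(p) \log^4(N)\big)$
Deterministic (DeVore, Iwen) | $\delta^{-2} k^2 \log^2(N)$ | | $\varepsilon^{-2} \log^2(p) \log^2(N)$
Subgaussian | $\delta^{-2} k \log(N/k)$ | | $\varepsilon^{-2} \log(p) \log(N)$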



4. Compressed sensing in redundant dictionaries. As shown recently in [TU], concentration
inequalities of type (3) allow for the extension of the compressed sensing methodology to redundant
dictionaries, in particular tight frames, as opposed to orthonormal bases only. Since signals with
sparse representations in redundant dictionaries comprise a much more realistic model of nature, this
extension of compressed sensing is fundamental. Our results show that basically all random matrix
constructions arising in the standard theory of compressed sensing (i.e., based on RIP estimates) also
yield compressed sensing matrices for the redundant framework.

5. Compressed sensing with cross validation. Compressed sensing algorithms are designed to recover
approximately sparse signals; if this assumption is violated, they may yield solutions far from the
input signal. In [40], a method of cross validation is introduced to detect such situations, and to
obtain tight bounds on the error incurred by compressed sensing reconstruction algorithms in general.
There, a subset $y_1 = \Phi_1 x$ of the $m$ measurements $y = \Phi x$ is held out from the
reconstruction algorithm and only the remaining measurements $y_2 = \Phi_2 x$ are used to produce a
candidate approximation $\hat{x}$ to the unknown $x$. If the hold-out matrix $\Phi_1$ satisfies the
Johnson-Lindenstrauss Lemma, then the observable quantity $\|\Phi_1(x - \hat{x})\|_2$ can be used as a
reliable proxy for the unknown error $\|x - \hat{x}\|_2$. Our work shows that any RIP matrix as in the
standard compressed sensing framework can be used for cross validation up to a randomization of its
column signs.
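The cross validation idea can be illustrated with a short simulation. The sketch below is an
assumption-laden stand-in: it uses a Gaussian hold-out matrix instead of a sign-randomized RIP matrix,
and it replaces the output of a reconstruction algorithm by a perturbed copy of $x$; the function name
and all sizes are hypothetical.

```python
import numpy as np

def cross_validation_error(phi_holdout, y_holdout, x_hat):
    """Observable proxy ||y_1 - Phi_1 x_hat||_2, which equals ||Phi_1 (x - x_hat)||_2."""
    return np.linalg.norm(y_holdout - phi_holdout @ x_hat)

rng = np.random.default_rng(2)
N, m1 = 2000, 400
x = rng.standard_normal(N)
x_hat = x + 0.1 * rng.standard_normal(N)            # stand-in for a reconstruction
phi1 = rng.standard_normal((m1, N)) / np.sqrt(m1)   # Gaussian hold-out matrix (assumption)
y1 = phi1 @ x                                       # held-out measurements
proxy = cross_validation_error(phi1, y1, x_hat)
true_error = np.linalg.norm(x - x_hat)
print(proxy / true_error)
```

The proxy is computed from $y_1$, $\Phi_1$ and $\hat{x}$ alone, so it is available without knowledge
of $x$.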

6. Optimal asymptotics in $\delta$ for RIP to hold. As mentioned above, it can be shown using a
Gelfand width argument that $m \asymp k \log(\frac{N}{k})$ is the optimal asymptotics (in $N$ and
$k$) of the embedding dimension for a matrix with the restricted isometry property. Our results,
combined with the known optimality of the asymptotics $m = \varepsilon^{-2} \log(p)$ for the embedding
dimension in the Johnson-Lindenstrauss Lemma (Theorem 1.1), imply that up to a factor of
$\log(\frac{1}{\delta})$, $m \asymp \delta^{-2}$ is the optimal asymptotics in the restricted isometry
constant $\delta$ for fixed $N$ and $k$ as $\delta \to 0$. Recall that this rate is realized by many
of the above examples, such as Gaussian random matrices.

5 Proof Ingredients 

The proof of Theorem 3.1 relies on concentration inequalities for Rademacher sequences and explicit
RIP-based norm estimates. The first concentration result is a classical inequality by Hoeffding [25].

Proposition 5.1 (Hoeffding's Inequality). Let $x \in \mathbb{R}^N$, and let
$\xi = (\xi_j)_{j=1}^N$ be a Rademacher sequence. Then, for any $t > 0$,

$$\mathbb{P}\Big(\Big|\sum_{j} \xi_j x_j\Big| > t\Big) \le 2\exp\Big(-\frac{t^2}{2\|x\|_2^2}\Big). \tag{6}$$
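For a small dimension, Hoeffding's inequality (6) can be checked exactly by enumerating all sign
patterns. The following sketch does this for one arbitrarily chosen vector; the helper names and the
vector are illustrative assumptions.

```python
import itertools
import math
import numpy as np

def rademacher_tail(x, t):
    """Exact P(|sum_j xi_j x_j| > t), enumerating all 2^N sign patterns."""
    count = 0
    for signs in itertools.product((-1.0, 1.0), repeat=len(x)):
        if abs(np.dot(signs, x)) > t:
            count += 1
    return count / 2 ** len(x)

def hoeffding_bound(x, t):
    """Right-hand side of (6): 2 exp(-t^2 / (2 ||x||_2^2))."""
    return 2.0 * math.exp(-t ** 2 / (2.0 * float(np.dot(x, x))))

x = np.array([0.5, -1.0, 2.0, 0.25, 1.5, -0.75, 1.0, 0.1, -0.3, 0.6])
for t in (0.5, 1.0, 2.0, 4.0, 6.0):
    print(t, rademacher_tail(x, t), hoeffding_bound(x, t))
```

The enumeration is feasible only for small $N$; its purpose is to show how the bound depends on $x$
only through $\|x\|_2$.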

The second concentration of measure result is a deviation bound for Rademacher chaos. There are many
such bounds in the literature; the following inequality dates back to [24], but appeared with
explicit constants and with a much simplified proof as Theorem 17 in [8].

Proposition 5.2. Let $X$ be the $N \times N$ matrix with entries $x_{i,j}$ and assume that
$x_{i,i} = 0$ for all $i \in [N]$. Let $\xi = (\xi_j)_{j=1}^N$ be a Rademacher sequence. Then, for any
$t > 0$,

$$\mathbb{P}\Big(\Big|\sum_{i,j} \xi_i \xi_j x_{i,j}\Big| > t\Big) \le 2\exp\Big(-\frac{1}{64} \min\Big(\frac{\frac{96}{65}\, t}{\|X\|},\ \frac{t^2}{\|X\|_F^2}\Big)\Big). \tag{7}$$

We also need the following basic estimate for RIP matrices (see for instance Proposition 2.5 in [31]).

Proposition 5.3. Suppose that $\Phi \in \mathbb{R}^{m \times N}$ has the Restricted Isometry Property
of order $2s$ and level $\delta$. Then for any two disjoint subsets $J, L \subset [N]$ of size
$|J| \le s$, $|L| \le s$,

$$\|\Phi_{(J)}^* \Phi_{(L)}\| \le \delta.$$

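The proof of our norm estimate for RIP-matrices uses Proposition 5.3 and relies on the observation
commonly used in the theory of compressed sensing (see for example [12]) that for $z$ in decreasing
arrangement and $\|z\|_2 = 1$, for $J \ge 2$ one has
$\|z_{(J)}\|_\infty \le \frac{1}{\sqrt{s}} \|z_{(J-1)}\|_2$ and thus
$\|z_{(\flat)}\|_\infty \le 1/\sqrt{s}$.

Proposition 5.4. Let $R = \lceil N/s \rceil$. Let
$\Phi = (\Phi_j) = (\Phi_{(1)}, \Phi_{(2)}, \ldots, \Phi_{(R)}) = (\Phi_{(1)}, \Phi_{(\flat)}) \in \mathbb{R}^{m \times N}$
have the $(2s, \delta)$-Restricted Isometry Property, let
$x = (x_j) = (x_{(1)}, x_{(2)}, \ldots, x_{(R)}) = (x_{(1)}, x_{(\flat)}) \in \mathbb{R}^N$ be in
decreasing arrangement with $\|x\|_2 \le 1$, and consider the symmetric matrix

$$C \in \mathbb{R}^{N \times N}, \qquad C_{j,\ell} = \begin{cases} x_j \Phi_j^* \Phi_\ell x_\ell, & j \not\sim \ell,\ j, \ell > s, \\ 0, & \text{else}, \end{cases}$$

and, for $b \in \{-1, 1\}^s$, the vector

$$v \in \mathbb{R}^{N-s}, \qquad v = D_{x_{(\flat)}} \Phi_{(\flat)}^* \Phi_{(1)} D_{x_{(1)}} b.$$

The following bounds hold: $\|C\| \le \frac{\delta}{s}$, $\|C\|_F \le \frac{\delta}{\sqrt{s}}$, and
$\|v\|_2 \le \frac{\delta}{\sqrt{s}}$.

Proof.

$$\begin{aligned}
\|C\| &= \sup_{\|y\|_2 = 1} |\langle y, Cy \rangle| \\
&\le \sup_{\|y\|_2 = 1} \sum_{\substack{J, L = 2 \\ J \ne L}}^{R} |\langle y_{(J)}, D_{x_{(J)}} \Phi_{(J)}^* \Phi_{(L)} D_{x_{(L)}} y_{(L)} \rangle| \\
&\le \sup_{\|y\|_2 = 1} \sum_{\substack{J, L = 2 \\ J \ne L}}^{R} \|y_{(J)}\|_2 \|y_{(L)}\|_2 \|D_{x_{(J)}} \Phi_{(J)}^* \Phi_{(L)} D_{x_{(L)}}\| \\
&\le \delta \sup_{\|y\|_2 = 1} \sum_{\substack{J, L = 2 \\ J \ne L}}^{R} \|y_{(J)}\|_2 \|y_{(L)}\|_2 \|x_{(J)}\|_\infty \|x_{(L)}\|_\infty \qquad (8) \\
&\le \delta \sup_{\|y\|_2 = 1} \sum_{\substack{J, L = 2 \\ J \ne L}}^{R} \|y_{(J)}\|_2 \|y_{(L)}\|_2 \frac{1}{\sqrt{s}} \|x_{(J-1)}\|_2 \frac{1}{\sqrt{s}} \|x_{(L-1)}\|_2 \\
&\le \frac{\delta}{s} \sup_{\|y\|_2 = 1} \sum_{J, L = 2}^{R} \Big(\frac{1}{2}\|x_{(J-1)}\|_2^2 + \frac{1}{2}\|y_{(J)}\|_2^2\Big) \Big(\frac{1}{2}\|x_{(L-1)}\|_2^2 + \frac{1}{2}\|y_{(L)}\|_2^2\Big) \qquad (9) \\
&\le \frac{\delta}{s}.
\end{aligned}$$

To obtain (9), we use the inequality of arithmetic and geometric means; to obtain (8), we use
Proposition 5.3.

Similarly,

$$\begin{aligned}
\|v\|_2 &= \sup_{\|y\|_2 = 1} \sum_{L=2}^{R} \langle y_{(L)}, D_{x_{(L)}} \Phi_{(L)}^* \Phi_{(1)} D_b x_{(1)} \rangle \\
&\le \sup_{\|y\|_2 = 1} \sum_{L=2}^{R} \|y_{(L)}\|_2 \|x_{(L)}\|_\infty \|\Phi_{(L)}^* \Phi_{(1)}\| \|b\|_\infty \|x_{(1)}\|_2 \\
&\le \sup_{\|y\|_2 = 1} \sum_{L=2}^{R} \|y_{(L)}\|_2 \frac{1}{\sqrt{s}} \|x_{(L-1)}\|_2 \|\Phi_{(L)}^* \Phi_{(1)}\| \|b\|_\infty \\
&\le \frac{\delta}{\sqrt{s}} \sup_{\|y\|_2 = 1} \sum_{L=2}^{R} \Big(\frac{1}{2}\|y_{(L)}\|_2^2 + \frac{1}{2}\|x_{(L-1)}\|_2^2\Big) \\
&\le \frac{\delta}{\sqrt{s}}.
\end{aligned}$$

For the Frobenius norm, we estimate:

$$\begin{aligned}
\|C\|_F^2 &= \sum_{\substack{j, \ell = s+1 \\ j \not\sim \ell}}^{N} (x_j \Phi_j^* \Phi_\ell x_\ell)^2 \\
&= \sum_{L=2}^{R} \sum_{\substack{j = s+1 \\ j \notin\, ]L[}}^{N} x_j^2 \sum_{\ell \in\, ]L[} (\Phi_j^* \Phi_\ell)^2 x_\ell^2 \\
&= \sum_{L=2}^{R} \sum_{\substack{j = s+1 \\ j \notin\, ]L[}}^{N} x_j^2 \|D_{x_{(L)}} \Phi_{(L)}^* \Phi_j\|_2^2 \\
&\le \sum_{L=2}^{R} \sum_{\substack{j = s+1 \\ j \notin\, ]L[}}^{N} x_j^2 \|x_{(L)}\|_\infty^2 \|\Phi_{(L)}^* \Phi_j\|^2 \\
&\le \delta^2 \sum_{L=2}^{R} \frac{1}{s} \|x_{(L-1)}\|_2^2 \sum_{j=1}^{N} x_j^2 \\
&\le \frac{\delta^2}{s}.
\end{aligned}$$

□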



6 Proof of the main results 

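We begin by proving Theorem 3.1. Without loss of generality, we assume that all $x \in E$ are
normalized so that $\|x\|_2 = 1$. Furthermore, assume that $k = 2s$ is even.

We first consider a fixed $x \in E$, eventually taking a union bound over all $x$. We further assume
that $x$ is in decreasing arrangement. To achieve this, we reorder the entries of $x$, and permute the
columns of $\Phi$ accordingly. This has no impact on the following estimates, as the Restricted
Isometry Property of the matrix $\Phi$ is invariant under permutations of its columns. We need to
estimate

$$\begin{aligned}
\|\Phi D_\xi x\|_2^2 = \|\Phi D_x \xi\|_2^2 = {} & \sum_{J=1}^{R} \|\Phi_{(J)} D_{x_{(J)}} \xi_{(J)}\|_2^2 + 2\, \xi_{(1)}^* D_{x_{(1)}} \Phi_{(1)}^* \Phi_{(\flat)} D_{x_{(\flat)}} \xi_{(\flat)} \\
& + \sum_{\substack{J, L = 2 \\ J \ne L}}^{R} \langle \Phi_{(J)} D_{x_{(J)}} \xi_{(J)}, \Phi_{(L)} D_{x_{(L)}} \xi_{(L)} \rangle. \qquad (10)
\end{aligned}$$

We will bound the terms separately.

1. As $\Phi$ has the Restricted Isometry Property of order $k \ge s$ and level $\delta$, it also has
the RIP of order $s$ and level $\delta$, and each $\Phi_{(J)}$ is almost an isometry. Hence, noting
that $\|D_{x_{(J)}} \xi_{(J)}\|_2 = \|D_{\xi_{(J)}} x_{(J)}\|_2 = \|x_{(J)}\|_2$, the first term can
be estimated as follows:

$$(1 - \delta)\|x\|_2^2 \le \sum_{J=1}^{R} \|\Phi_{(J)} D_{x_{(J)}} \xi_{(J)}\|_2^2 \le (1 + \delta)\|x\|_2^2.$$

Thus, using that $\delta \le \varepsilon/4$,

$$\Big(1 - \frac{\varepsilon}{4}\Big)\|x\|_2^2 \le \sum_{J=1}^{R} \|\Phi_{(J)} D_{x_{(J)}} \xi_{(J)}\|_2^2 \le \Big(1 + \frac{\varepsilon}{4}\Big)\|x\|_2^2.$$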

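2. To estimate the second term, fix $\xi_{(1)} =: b$ and consider the random variable

$$X = b^* D_{x_{(1)}} \Phi_{(1)}^* \Phi_{(\flat)} D_{x_{(\flat)}} \xi_{(\flat)} = \langle v, \xi_{(\flat)} \rangle$$

with $v$ as in Proposition 5.4. By Hoeffding's inequality (Proposition 5.1) combined with
Proposition 5.4,

$$\mathbb{P}(|X| > \gamma \varepsilon) \le 2\exp\Big(-\frac{\gamma^2 s \varepsilon^2}{2\delta^2}\Big).$$

Taking a union bound, one obtains:

$$\mathbb{P}(\exists x \in E : |X| > \gamma \varepsilon) \le \exp\Big(\log p + \log 2 - \frac{\gamma^2 s \varepsilon^2}{2\delta^2}\Big). \tag{11}$$

In order for this probability to be less than $\eta/2$, we need:

$$\log(2p) - \frac{\gamma^2 s \varepsilon^2}{2\delta^2} \le \log\frac{\eta}{2},$$

that is,

$$\delta \le \varepsilon \gamma \sqrt{\frac{s}{2 \log(4p/\eta)}}. \tag{12}$$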

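3. We can rewrite the third term as

$$\sum_{\substack{J, L = 2 \\ J \ne L}}^{R} \langle \Phi_{(J)} D_{x_{(J)}} \xi_{(J)}, \Phi_{(L)} D_{x_{(L)}} \xi_{(L)} \rangle = \sum_{j, \ell = s+1}^{N} \xi_j \xi_\ell C_{j,\ell},$$

where $C \in \mathbb{R}^{N \times N}$ is the matrix as in Proposition 5.4. By Proposition 5.4, we have
$\|C\| \le \frac{\delta}{s}$ and $\|C\|_F \le \frac{\delta}{\sqrt{s}}$, hence by Proposition 5.2,

$$\mathbb{P}\Big(\Big|\sum_{j, \ell = s+1}^{N} \xi_j \xi_\ell C_{j,\ell}\Big| > \tau \varepsilon\Big) \le 2\exp\Big(-\frac{s}{64} \min\Big(\frac{\tau^2 \varepsilon^2}{\delta^2},\ \frac{96\, \tau \varepsilon}{65\, \delta}\Big)\Big).$$

Using a union bound, one obtains:

$$\mathbb{P}\Big(\exists x \in E : \Big|\sum_{j, \ell = s+1}^{N} \xi_j \xi_\ell C_{j,\ell}\Big| > \tau \varepsilon\Big) \le 2\exp\Big(\log p - \frac{s}{64} \min\Big(\frac{\tau^2 \varepsilon^2}{\delta^2},\ \frac{96\, \tau \varepsilon}{65\, \delta}\Big)\Big). \tag{13}$$

In order for this probability to be less than $\eta/2$, we need:

$$\log(2p) - \frac{s}{64} \min\Big(\frac{\tau^2 \varepsilon^2}{\delta^2},\ \frac{96\, \tau \varepsilon}{65\, \delta}\Big) \le \log(\eta/2),$$

that is,

$$\delta \le \varepsilon \min\Big(\frac{\tau \sqrt{s}}{8 \sqrt{\log(4p/\eta)}},\ \frac{96\, \tau s}{65 \cdot 64\, \log(4p/\eta)}\Big). \tag{14}$$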

By assumption, $\delta \le \frac{\varepsilon}{4}$, so conditions (12) and (14) are satisfied by
setting $\tau = .55$, $\gamma = .1$ and $s \ge 20 \log(4p/\eta)$ (that is,
$k = 2s \ge 40 \log(4p/\eta)$). Then the second term is bounded by $.2\varepsilon$ in absolute value,
and the last term is bounded by $.55\varepsilon$. Together with the deterministic RIP-based estimate
for the first term, this implies the Theorem. □

Proof of Proposition 3.2. Fix $\varepsilon > 0$, and suppose that there is a constant $c_3$ such that
for all pairs $(k, m)$ with $k \le c_3 \delta^2 m / \log(N/k)$,
$\Phi = \Phi(m) \in \mathbb{R}^{m \times N}$ has the Restricted Isometry Property of order $k$ and
level $\delta = \frac{\varepsilon}{4}$. Now let $(k, m)$ be admissible. An elementary monotonicity
argument shows that there exists $k' \ge k$ such that $(k', m)$ is admissible and
$k' \ge \frac{1}{2} c_3 \delta^2 m / \log(N/k')$. Fix $x \in \mathbb{R}^N$ and let
$\xi \in \mathbb{R}^N$ be a Rademacher sequence. Then, for any fixed vector $x \in \mathbb{R}^N$, the
estimates in equations (11) and (13) with parameters $\tau = .55$ and $\gamma = .1$ imply the
existence of a constant $c_5 \le 1$ for which

$$\begin{aligned}
\mathbb{P}\Big(\big|\|\Phi D_\xi x\|_2^2 - \|x\|_2^2\big| \ge \varepsilon \|x\|_2^2\Big) &\le 2\exp(-c_5 k') \\
&\le 2\exp\big(-c_4 \varepsilon^2 m \log^{-1}(N/k')\big), \qquad (15)
\end{aligned}$$

where $c_4 = c_5 c_3 / 32$. □
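The numerical constants used in the two proofs above ($\tau = .55$, $\gamma = .1$,
$s \ge 20 \log(4p/\eta)$, $\delta = \varepsilon/4$) can be sanity-checked with a few lines of
arithmetic. The sketch below is only a check of the exponents in the Hoeffding and chaos bounds under
these choices; the function name and the sample values of $\varepsilon$, $p$, $\eta$ are assumptions.

```python
import math

def conditions_hold(eps, p, eta, tau=0.55, gamma=0.1):
    """Check that both tail exponents are at least log(4p/eta) when
    s = ceil(20 log(4p/eta)) and delta = eps / 4."""
    log_term = math.log(4 * p / eta)
    s = math.ceil(20 * log_term)
    delta = eps / 4
    # Exponent from Hoeffding's inequality applied to <v, xi>, ||v||_2 <= delta/sqrt(s).
    hoeffding_exponent = gamma ** 2 * s * eps ** 2 / (2 * delta ** 2)
    # Exponent from the chaos bound with ||C|| <= delta/s, ||C||_F <= delta/sqrt(s).
    chaos_exponent = (s / 64) * min(tau ** 2 * eps ** 2 / delta ** 2,
                                    96 * tau * eps / (65 * delta))
    return hoeffding_exponent >= log_term and chaos_exponent >= log_term

# Contributions of the three terms, in units of eps:
# first term eps/4, second term 2 * gamma * eps, third term tau * eps.
total_distortion = 0.25 + 2 * 0.1 + 0.55

print(conditions_hold(0.5, 1000, 0.01), total_distortion)
```

With $\delta = \varepsilon/4$ the minimum in the chaos exponent equals $4 \cdot \frac{96}{65} \cdot .55 \approx 3.25$,
so $s/64$ times this quantity exceeds $\log(4p/\eta)$ once $s \ge 20 \log(4p/\eta)$, with little room
to spare.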



Remarks: Although we have stated the main result for the setting $x \in \mathbb{R}^N$ and
$\Phi \in \mathbb{R}^{m \times N}$, all of the analysis holds also in the complex setting,
$x \in \mathbb{C}^N$ and $\Phi \in \mathbb{C}^{m \times N}$.

As shown in [7], a random matrix $\Phi$ whose entries follow a subgaussian distribution is known to
have with high probability the Restricted Isometry Property of best possible order, that is, one can
choose $m \ge c\, \delta^{-2} k \log(\frac{N}{k})$. When $k \ge 40 \log\frac{4p}{\eta}$, $\Phi$ (with
randomized column signs) is a JL embedding by Theorem 3.1, and our resulting bound for $m$ is optimal
up to a single logarithmic factor in $N$. This shows that Theorem 3.1 must also be optimal up to a
single logarithmic factor in $N$.